Chapter 28
Connecting the Intranet and the Internet
Sooner or later, you are going to want to take advantage of what
you now know about building an internal Web to help your organization
establish a presence on the World Wide Web. Whether to help sell
your products, share research, publish news, market software,
provide customer support, or solicit customer feedback, a site
on the WWW can help your organization further its goals.
Until recently, few people thought it was possible to run a decent
Web server on Windows NT. I hope this book has convinced you,
if it has done nothing else, how untrue that myth is. In fact,
I have found Windows NT to be an excellent Web platform in terms
of performance, cost, and ease of use. And if you want to build
a Web site without spending years learning about UNIX system
administration, this chapter can help you get started.
The aim of this chapter is to go over a few assorted topics that
you will likely come across when you connect your Intranet to
the Internet. This chapter is by no means complete; whole books
are written on the subject of how to build an Internet server.
But few of those are devoted to Windows NT Web sites. One in particular
is Web Site Construction Kit for Windows NT, written by
Christopher L. T. Brown and yours truly. That book will provide
you with a complete treatment of all aspects of building an Internet
server with Windows NT 4.
This chapter will address using NT and a modem as a router, how
to deal with Internet robots, and why you will want to consider
building a firewall between your Intranet and the Internet. Most
of the material in this chapter is not about running an Internet
Web site; rather, it is about connecting your LAN to the Internet
in the first place.
Let's say you want to connect a small Local Area Network (LAN)
to the Internet using Windows NT, Remote Access Service (RAS),
and a modem. Your Intranet customers are telling you that they
all want access to the World Wide Web for research and software
downloading, but you would like to avoid buying separate modems
and phone lines for everyone. If you use Windows NT as a router,
you only need to buy one modem, one connection to the Internet
Service Provider (ISP), and one extra phone line. Of course, bandwidth
is a consideration, but you can always scale up to ISDN, Frame
Relay, or T1 as your needs dictate. Let's be very clear about
this: using a 28.8 Kbps RAS modem connection to the Internet does
not make for a high-performance router.
If the idea behind this sounds appealing to you, be prepared for
the fact that the path to success can be strewn with land mines.
This section will present a summary of the steps that I took to
make this work on my small LAN. The ideas presented will reflect
my experiences, but they are based on advice from the Windows
NT Webserver mailing list, conversations with experts (including
my ISP), and assorted books and magazine articles (listed in the
Bibliography).
Here is an overview of the various tasks that await you as you
connect your LAN to the Internet.
- You need to obtain a dial-up PPP
account with an Internet Service Provider in your area. Either
get one subnet account with at least twice as many IP addresses
as you have client workstations on your LAN, or get one IP address
in a separate range (for RAS) plus a subnet with enough IP addresses
to accommodate all the workstations on your LAN. I'll discuss
this in more detail later.
- You need to hook up your modem
through RAS and enable it to dial into your ISP as a stand-alone
machine. Ensure that you can browse the World Wide Web from your
NT Server.
- You need to configure your NT Server
to redial the connection in the event of a disconnect or a server
crash. You can use the SomarSoft Redial program for this.
- You need to install Network Interface
Cards (NICs) in the server and in each workstation machine. Then
select and install the appropriate network wiring between the
NICs. This step is more a fundamental part of building a LAN than
anything else. The workstations can be running Windows NT, Windows
95, or Windows for Workgroups. In the case of WfW, the Microsoft
TCP/IP add-on pack must be installed separately.
- You need to adjust the Registry on the
NT Server so that it can function as a router.
- You need to configure the TCP/IP Address,
Subnet Mask, and Default Gateway in each of the workstations.
Ensure that the workstations can ping the NT Server.
- You need to adjust the routing
tables on the NT Server to achieve the Internet connection for
the workstations.
Once you have accomplished all this, you are ready to surf the
Web from the client computers as well as from the server.
Windows NT Registry Settings for the Router
Based on documentation from various sources, including the Microsoft
TechNet CD, the following Registry parameters are a superset of
what is strictly required for NT to function as a router between
the LAN and the Internet. The reason I say "superset"
is that your particular situation will determine whether or not
you will need all of these. Future versions of NT may introduce
other parameters enabling further optimization. Experimentation
may be necessary to achieve the best results. All of the entries
are documented in the Windows NT Resource Kit.
Now it's time to fire up RegEdit. To avoid repeating a large portion
of the name of each value, I will tell you up front that they
are all located under the following key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services
The following values can be found underneath this Services key.
If they are not already present, you will need to add them as
follows:
- RasArp\Parameters\DisableOtherSrcPackets = 0
- RasMan\PPP\IPCP\PriorityBasedOnSubNetwork = 1
- TcpIp\Parameters\IpEnableRouter = 1
- TcpIp\Parameters\ForwardBroadcasts = 1 (This parameter is documented
as unused on the Microsoft TechNet CD, and its benefit in this
scenario is unclear.)
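Rather than adding each of these values by hand in RegEdit, you
can merge them all at once from a .reg file. Here is a minimal
sketch; it assumes all four entries are REG_DWORD values, so verify
the types against the Resource Kit documentation before importing:

REGEDIT4

; Registry values for using NT and RAS as a LAN-to-Internet router
[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\RasArp\Parameters]
"DisableOtherSrcPackets"=dword:00000000

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\RasMan\PPP\IPCP]
"PriorityBasedOnSubNetwork"=dword:00000001

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters]
"IpEnableRouter"=dword:00000001
"ForwardBroadcasts"=dword:00000001

Save the text as router.reg and double-click it (or run regedit
router.reg) to merge the values, then reboot so the changes take
effect.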
Assigning Default Gateway Addresses on the LAN
The next trick is to assign one IP address to your RAS dial-up
connection and a different IP address to each of the NICs on your
LAN, including the Windows NT Server. It is very likely that you
have already dealt with this issue as part of the overall configuration
of TCP/IP on your Intranet.
Once you have assigned a unique IP address to each NIC on the
LAN, set the Default Gateway on each of the client machines to
the address of the NIC on the NT router. The Default Gateway of
the NIC on the NT Server itself should be left blank. These steps
are very important, so make sure that you apply them accurately.
For reasons that I will get into in a moment, you must not try
to take a shortcut by sending the client machine packets straight
to the IP address of the RAS connection; the Default Gateway on
the clients must be the NIC on the NT Server.
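To make this concrete, here is a sketch of the assignments using
the invented 200.9.9.x addresses that reappear later in this chapter
(the RAS address is assigned by your ISP):

NT Server NIC     IP address: 200.9.9.1   Default Gateway: (left blank)
Client 1 NIC      IP address: 200.9.9.2   Default Gateway: 200.9.9.1
Client 2 NIC      IP address: 200.9.9.3   Default Gateway: 200.9.9.1
RAS connection    IP address: from the ISP, on a separate subnet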
It might help if we develop a mental picture of what's going on
here. Each machine on the LAN needs to be told where to send outgoing
packets so that they will find their way out to the Internet.
Otherwise, outbound packets would have no way to get off the LAN.
The NT Server with the RAS connection is the gateway (or router).
Since the NT Server is a dual-homed host (supporting a NIC and
running RAS), packets from the NICs on the LAN that are sent to
the IP address of the NIC on the NT Server will be routed by the
NT box from its NIC over to its RAS connection. This will be accomplished
within the TCP/IP software running in Windows NT. From there,
the packets will travel through the modem to the ISP and on out
to the Internet.
The Default Gateway of the RAS TCP/IP connection on the NT Server
appears to be non-configurable in NT 4, but that is okay because
RAS conveniently seems to assume that its Default Gateway is on
the network it is dialing into, namely your ISP. For example,
if you type ipconfig after
establishing a RAS connection to your ISP, you will see that the
Default Gateway is the same as the IP address assigned to RAS.
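For instance, hypothetical ipconfig output for the RAS adapter
might look like the following (the address is invented for illustration,
and the adapter name varies from system to system):

PPP adapter NdisWan5:

        IP Address. . . . . . . . . : 204.9.9.17
        Subnet Mask . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . : 204.9.9.17

Note that the Default Gateway matches the IP address assigned
to the RAS connection.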
Assigning Subnets
Subnets can be a rather mysterious area of TCP/IP. The most important
point about subnetting, as it relates to this discussion, is that
the IP address of the RAS connection must be in a separate subnet
from the IP address of the NIC in the NT Server. That is why the
clients cannot list the RAS IP address as the Default Gateway:
technically, they can't see it.
TCP/IP Addressing
For all TCP/IP issues great and small I highly recommend Internetworking with TCP/IP, Volume I, Third Edition, by Douglas E. Comer as an excellent permanent reference book. There is no way I can possibly give a complete treatment of IP addressing here, so I'll just take a stab at the basics. A Class C TCP/IP subnet can contain up to 254 separate IP addresses. The addresses are of the form X.A.B.C, where X, A, and B are fixed by the ISP, and C ranges between 0 and 255 for each machine on your LAN. For a Class C address, the X in the above address will always be between 192 and 223. X numbers less than 128 are Class A addresses. X numbers between 128 and 191 are Class B addresses.
Each of the numbers in a TCP/IP address consists of 8 bits, providing 256 possibilities, ranging between 0 and 255 in decimal. By convention, any TCP/IP subnet reserves the address with all zeros to mean the address of the network itself, and the address with all ones to mean the broadcast address. That is why each subnet may contain two fewer computers than the size of the subnet.
If you can afford to buy one IP address for your RAS connection
and a complete (separate) Class C address for the rest of your
LAN, you will have a very easy time connecting your LAN to the
Internet. In that case, you will simply use a subnet mask of 255.255.255.0
for each IP address on your LAN. The subnet mask for RAS would
also be 255.255.255.0 because it would already be on a different
network segment.
If you want to perform bit-pattern subnetting, things definitely
get more involved. Let's suppose you have six client computers
to connect to the Internet. You might think that you can get by
with a subnet-8 account, as leased by many ISPs. But remember,
you need one IP address for RAS, a different IP address for the
NIC in the NT Server, and the all zeros and all ones addresses
on the subnet will be skipped by convention (although in certain
cases you can violate that rule).
In other words, you need a subnet big enough to hold a total of
ten IP addresses. At this point you would think that all you need
is a subnet-16, again an easy thing to lease from an ISP. Unfortunately,
it's not even that simple. Remember, the RAS IP address must be
in a different subnet. Each bit you use for a subnet divides the
range of usable addresses in half.
You could divide the subnet-16 in half and devote the numbers
0 through 7 to RAS: skipping the zeroth address, which serves
as the subnet address itself, you could assign the first address
to RAS (addresses 2 through 7 would be wasted). You would then
give the numbers 8 through 15 in the higher subnet to the six
client machines, again skipping 8 and 15 per convention. That
would work, except that we have left out the NIC in the NT Server.
Therefore, you would either need to step up to a block of 32 addresses
or drop one client machine.
Whatever method you use, your ISP should be able to help you establish
the bit pattern for the subnet mask on the clients. If you can't
afford to go with a full Class C address for the LAN, the fourth
octet of the subnet mask must be split into a binary pattern,
with zeroes in the low-order bits spanning the number of addresses
needed. The subnet mask on the RAS connection can simply be a
Class C mask in all cases: 255.255.255.0. I believe that this
is because RAS always behaves as a LAN-to-WAN router, possibly
allowing it to ignore the subnet mask.
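To make the arithmetic concrete, here is a worked sketch with
invented addresses. Suppose the ISP delegates the block 200.9.9.0
through 200.9.9.15 (a subnet-16) and you split it in half with
a subnet mask of 255.255.255.248 (binary 11111000 in the fourth
octet) on the LAN:

200.9.9.0        subnet address of the lower half
200.9.9.1        RAS connection
200.9.9.2-6      unused
200.9.9.7        broadcast address of the lower half
200.9.9.8        subnet address of the upper half
200.9.9.9        NIC in the NT Server
200.9.9.10-14    client machines (five of them)
200.9.9.15       broadcast address of the upper half

As discussed above, the upper half has only six usable addresses;
once the server's NIC takes one, five remain for clients, and
a sixth client would force you up to a block of 32.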
Adding Static TCP/IP Routes for the Workstations
There is just one more step to get your LAN packets to travel
out to the Internet. Actually, the purpose of this step is to
let the data packets returning from the Internet find their way
from the RAS connection on the NT Server over to the client machine
that initiated the transaction.
You need to add static routes to the route table on the NT Server.
This is done at the DOS command prompt using the route
add command.
I'll explain this by example. Suppose you have given the NIC in
the NT Server an IP address of 200.9.9.1 and the NIC address of
the first client machine is 200.9.9.2. The procedure must be duplicated
for each client machine.
The following command would provide a static route from the NT
Server to the first client:
route -p add 200.9.9.2 200.9.9.1
The -p switch marks this as a persistent
route. NT stores it in the Registry so that it is retained across
subsequent reboots.
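Repeating the command for each client (here assuming two more
clients at 200.9.9.3 and 200.9.9.4) and then checking the result
might look like this:

route -p add 200.9.9.2 200.9.9.1
route -p add 200.9.9.3 200.9.9.1
route -p add 200.9.9.4 200.9.9.1
route print

The route print command displays the active routing table, so
you can confirm that the persistent routes were recorded.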
Finally, if you find that your client machines can ping Internet
addresses by TCP/IP number only, but DNS name resolutions and
Web browsing do not work, it could be that your ISP has not provided
static routes on their end to pass network traffic into your RAS
connection when the destination address is one in your subnet.
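One quick way to narrow this down from a client machine is to
ping the same destination by number and by name (both values below
are invented for illustration):

ping 204.252.2.5
ping www.yourisp.net

If the ping by number succeeds but the ping by name fails, you
are seeing exactly the symptom described above; check the DNS
settings on your clients and raise the static-route question with
your ISP.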
Internet Robots
World Wide Web robots, sometimes called wanderers or spiders,
are programs that traverse the Web automatically. A robot's job
is to retrieve information about the documents that are available
on the Web and then store that information in some kind of master
index of the Web. Usually, the robot is limited by its author
to hunt for a particular topic or segment of the Web.
At the very least, most robots are programmed to look at the <TITLE>
and <H1> tags in the
HTML documents they discover. Then they scan the contents of the
file looking for <A HREF>
tags to other documents. A typical robot might store the URLs
of those documents in a data structure called a tree, which
it uses to continue the search whenever it reaches a dead-end
(more technically called a leaf-node). I am oversimplifying
this a bit; the larger robots probably use much more sophisticated
algorithms. But the basic principles are the same.
The idea behind this is that the index built by the robot will
make life easier for us humans who would like a quick hop to information
sources on the Internet.
The good news is that most robots are successful at this and do
help make subsequent search and retrieval of those documents more
efficient. This is important in terms of Internet traffic. If
a robot spends several hours looking for documents, but thousands
(or even millions) of users take advantage of the index that is
generated, it will save all those users from tapping their own
means of discovering the links, potentially saving a great amount
of network bandwidth.
The bad news is that some robots inefficiently revisit the same
site more than once, or they submit rapid-fire requests to the
same site in such a frenzy that the server can't keep up. This
is obviously a cause of concern for Webmasters. Robot authors
are as upset as the rest of the Internet community when they find
out that a poorly behaved robot has been unleashed. But usually
such problems are found only in a few poorly written robots.
Fortunately, guidelines have been developed for robot authors,
and most robots are compliant. For an excellent online resource
about robots, including further information on which much of this
material is based, see "World Wide Web Robots, Wanderers,
and Spiders" by Martijn Koster, at http://info.webcrawler.com/mak/projects/robots/robots.html.
It contains links to documents describing robot guidelines, the
standard for robot exclusion, and an in-depth collection of information
about known robots.
Tip: The Internet community puts up with robots because robots give something back to all of us. A private robot, on the other hand, is one that you might customize to search a limited realm of interest to you or your organization. Private robots are frowned upon because they use Internet resources but offer value to only a single user in return. If you are looking for your own Internet robot, however, you can check out the Verity Inc. home page at http://www.verity.com/. Please remember that one of the guidelines of robot design is to first analyze carefully whether a new robot is really called for.
A good understanding of Web robots and how to use or exclude them
will aid you in your Web ventures; in fact, it could help to keep
your server alive.
Excluding Robots
There are lots of reasons to want to exclude robots from visiting
your site. One reason is that rapid-fire requests from buggy robots
could drag your server down. Also, your site might contain data
that you do not want to be indexed by outside sources. Whatever
the reason, there is an obvious need for a method for robot exclusion.
Be aware, however, that it wouldn't be helpful to the Internet
community if all robots were excluded from all sites.
On the Internet Web-related newsgroups and listservers, you will
often see a new Web site administrator ask the question "What
is robots.txt and why are
people looking for it?" This question often comes up after
the administrator looks at his or her Web access logs and sees
the following line:
Tue Jun 06 17:36:36 1995 204.252.2.5 192.100.81.115 GET /robots.txt HTTP/1.0
Knowing that they don't have a file named robots.txt
in the root directory, most administrators are puzzled.
The answer is that robots.txt
is part of the Standard for Robot Exclusion. The standard was
agreed to in June 1994 on the robots mailing list (robots-request@webcrawler.com)
by the majority of robot authors and other people with an interest
in robots. The information on these pages is based on the working
draft of the exclusion standard, which can be found at this URL:
http://info.webcrawler.com/mak/projects/robots/norobots.html
Some of the things to take into account concerning the Standard
for Robot Exclusion are:
- It is not an official standard backed by a standards body.
- It is not enforced by anybody, and there are no guarantees
that all current and future robots will adhere to it.
- Consider it a loose standard that the majority of robot authors
will follow.
In addition to using the exclusion described below, there are
a few other simple steps you can follow if you discover an unwanted
robot visiting your site:
- Check your Web server log files to detect the frequency of
document retrievals.
- Try to determine where the robot originated. This will enable
you to contact the author. You can find the author by looking
at the User-agent and From fields in the request, or look up the
host domain in the list of robots.
- If the robot is annoying in some fashion, let the robot author
know about it. Ask the author to visit http://info.webcrawler.com/mak/projects/robots/robots.html
so he or she can read the guidelines for robot authors and the
standard for exclusion.
The Method
The method used to exclude robots from a server is to create a
file on the server that specifies an access policy for robots.
This file must be named robots.txt,
and it must reside in the HTML document root directory.
The file must be accessible via HTTP, with the contents as specified
here. The format and semantics of the file are as follows:
- The file consists of one or more records separated by one
or more blank lines (terminated by CR, CR/NL, or NL). Each record
contains lines of the form <field name>:<optional space><value><optional space>.
The field name is case-insensitive. Comments can be included in
the file using UNIX Bourne shell conventions: the # character
indicates that the preceding space (if any) and the remainder
of the line up to the line termination are discarded. Lines containing
only a comment are discarded completely and therefore do not indicate
a record boundary. The record starts with one or more User-agent
lines, followed by one or more Disallow lines. Unrecognized headers
are ignored.
- User-agent: The value of this field is the name of the robot
for which the record describes an access policy. If more than
one User-agent field is present, the record describes an identical
access policy for more than one robot. At least one field needs
to be present per record. The robot should be liberal in interpreting
this field; a case-insensitive substring match of the name, without
version information, is recommended. If the value is *, the record
describes the default access policy for any robot that has not
matched any of the other records. It is not allowed to have two
such records in the robots.txt file.
- Disallow: The value of this field specifies a partial URL that
is not to be visited. This can be a full path or a partial path;
any URL that starts with this value will not be retrieved. For
example, Disallow: /help disallows both /help.htm and /help/default.htm,
whereas Disallow: /help/ disallows /help/default.htm but allows
/help.htm.
An empty value indicates that all URLs can be retrieved. At least
one Disallow field needs to be present in a record. The presence
of an empty /robots.txt file
has no explicit associated semantics; it will be treated as if
it were not present, meaning all robots will consider themselves
welcome to scan.
Examples
Here is a sample robots.txt
for http://www.yourco.com/
that specifies no robots should visit any URL starting with /yourco/cgi-bin/
or /tmp/:
User-agent: *
Disallow: /yourco/cgi-bin/
Disallow: /tmp/
Here is an example that indicates no robots should visit the current
site:
User-agent: *
Disallow: /
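Here is one more variation that turns away a single robot by name
while leaving the site open to everyone else (the robot name BadBot
is invented for illustration):

User-agent: BadBot
Disallow: /

User-agent: *
Disallow:

The empty Disallow value in the second record means that all URLs
can be retrieved, as described earlier.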
Firewalls
If you intend to maintain an Internet connection and you truly
want a secure site, you should consider getting firewall protection.
A firewall can be software, hardware, or a combination
of the two. Commercial firewall packages cost a lot more than
loose change; prices range anywhere from $1,000 to $100,000. If
you are using NT, RAS, and a modem as a software-based router,
then the first step in building a firewall is to change from a
RAS-based connection to an Ethernet/router hardware-based connection.
Note: For a much more thorough treatment of firewalls and Internet security, please see Internet Firewalls and Network Security by Karanjit Siyan and Chris Hare, published by New Riders Publishing.
There aren't yet many software-only firewalls for Windows NT,
as most are based on UNIX. In the meantime, you might consider
running a freeware version of UNIX for the purpose of including
a firewall in your network.
Note: If the cost of a firewall has you worried, consider a much cheaper and more secure solution: avoid connecting your LAN to the Internet Web server altogether. Obviously, there are drawbacks to this approach, but it is the most secure one. For one thing, it assumes your client machines won't need a connection to browse the Web. For another, you have to use sneaker-net (hand-carried floppy disks) to modify your HTML files on the Web server. This inconvenience can be reduced by physically reconnecting the network cable to the back of the Web server machine for a limited time during the day when you need to access it.
A firewall usually includes several software tools. For example,
it might include separate proxy servers for e-mail, FTP, Gopher,
Telnet, Web, and WAIS. The firewall can also filter certain outbound
ICMP (Internet Control Message Protocol) packets so your server
won't be capable of divulging network information.
Figure 28.1 shows a network diagram of a typical LAN connection
to the Internet including a Web server and a firewall. Note that
the Web server, LAN server, and firewall server could all be rolled
into one machine if the budget is tight, but separating them as
shown here is considered a safer environment.
Figure 28.1: Using a firewall/proxy server on a LAN.
The proxy server is used to mask all of your LAN IP addresses
on outbound packets so they look like they all originated at the
proxy server itself. Each of the client machines on your LAN must
use the proxy server whenever they connect to the Internet for
FTP, Telnet, Gopher, or the Web. The reason for doing this is
to prevent outside detection of the structure of your network.
Otherwise, hackers monitoring your outbound traffic would eventually
be able to determine your individual IP addresses and then use
IP spoofing to feed those addresses back to your server when they
want to appear as a known client.
Another purpose of a firewall is to perform IP filtering
of incoming packets. Let's say that you have been monitoring the
log files on your Web server and you keep noticing some unusual
or unwanted activity originating from IP address x.3.5.9.
After checking with the whois
program (available on the Internet, for example, at http://www.winsite.com),
you determine the domain name is bad.com,
and you don't have any known reason to be doing business with
them. You can configure the IP filter to block any connection
attempts originating from bad.com
while still allowing packets from the friendly good.com
to proceed.
Caution: Many people think IP packet filtering is worthless if implemented only in software. They may advise you that packet filtering is useful only in a router or a hardware firewall solution that can filter at the Link Layer, as opposed to the Network or Transport layers where the TCP/IP software operates. However, even if you do packet filtering in software, the trick is to filter based on both the source IP and the interface. That is, if a packet with a source address that is also an internal IP shows up on the external interface of your router (that is, from the Internet side), you should drop the packet and log it for immediate attention.
Summary
This chapter has given you a brief taste of some of the issues
involved in connecting the LAN to the Internet. This has by no
means been a comprehensive treatment of the subject. I hope that
it has been enough, however, to get you started building your
own site on the World Wide Web or providing Internet access for
all customers who need to go beyond the Intranet.
In the next and final chapter I will continue with the idea of
building your own Web site by showing you a very handy Perl script
that can be used to collect statistics of your visitors. When
you think about it, you could even use the script on the Intranet.
The chapter also provides a quick introduction to the Perl language
and discusses other applications to which a Webmaster can apply
Perl.
